Robustifying models against adversarial attacks by Langevin dynamics
Authors
Abstract
Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains a challenging task. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high-density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA), with the perceptual boundary taken into account. To achieve this, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly. This prevents white-box attackers from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust classification as well as detection of adversarial samples. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
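To make the projection step concrete, here is a minimal sketch, assuming a trained sDAE whose reconstruction residual r(x) - x approximates sigma^2 times the gradient of the log data density (the standard denoising-autoencoder score estimate). The names sdae, eps, sigma2, and n_steps are illustrative placeholders, and the Metropolis accept/reject correction that turns this plain Langevin update into full MALA is omitted for brevity; this is not the authors' implementation.

```python
import torch

def sdae_score(sdae, x, sigma2=0.1):
    # Denoising-autoencoder score estimate: the residual r(x) - x approximates
    # sigma^2 * grad_x log p(x), so dividing by sigma^2 yields the score.
    return (sdae(x) - x) / sigma2

def langevin_project(sdae, x, n_steps=50, eps=0.05, sigma2=0.1):
    # Drive a (possibly adversarial) input toward high-density regions of the
    # learned distribution. The injected noise is what gives the broad,
    # dispersed projection described in the abstract.
    x = x.clone()
    for _ in range(n_steps):
        noise = torch.randn_like(x)
        x = x + 0.5 * eps ** 2 * sdae_score(sdae, x, sigma2) + eps * noise
    return x
```

The relaxed sample would then be passed to the unchanged classifier, which is why such a defense can wrap any existing model without retraining it.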
Similar resources
Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models
In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend de...
Divide, Denoise, and Defend against Adversarial Attacks
Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch...
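As a rough, hypothetical illustration of the divide-and-denoise idea sketched in that abstract: split the image into non-overlapping patches and denoise each one independently before reassembling. The patch size and the uniform-filter stand-in for the denoiser are assumptions for illustration, not the cited paper's method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def divide_and_denoise(image: np.ndarray, patch_size: int = 32) -> np.ndarray:
    # Split a 2-D (grayscale) image into non-overlapping patches, denoise each
    # patch with a placeholder box filter, and write it back into the output.
    out = image.copy()
    h, w = image.shape[:2]
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = image[i:i + patch_size, j:j + patch_size]
            out[i:i + patch_size, j:j + patch_size] = uniform_filter(patch, size=3)
    return out
```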
Defending Non-Bayesian Learning against Adversarial Attacks
This paper addresses the problem of non-Bayesian learning over multi-agent networks, where agents repeatedly collect partially informative observations about an unknown state of the world, and try to collaboratively learn the true state. We focus on the impact of the adversarial agents on the performance of consensus-based non-Bayesian learning, where non-faulty agents combine local le...
Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class proba...
Protecting JPEG Images Against Adversarial Attacks
As deep neural networks (DNNs) have been integrated into critical systems, several methods to attack these systems have been developed. These adversarial attacks make imperceptible modifications to an image that fool DNN classifiers. We present an adaptive JPEG encoder which defends against many of these attacks. Experimentally, we show that our method produces images with high visual quality w...
Journal
Journal title: Neural Networks
Year: 2021
ISSN: 1879-2782, 0893-6080
DOI: https://doi.org/10.1016/j.neunet.2020.12.024